Fast unaligned cache access system and method

ABSTRACT

A cache unit includes multiple memory towers, which can be independently addressed. Cache lines are divided among multiple towers. Furthermore, physical lines of the memory towers are shared by multiple cache lines. Because each tower can be addressed independently and the cache lines are split among the towers, unaligned cache access can be performed. Furthermore, power can be conserved because not all the memory towers of the cache unit need to be activated during some memory access operations.

FIELD OF THE INVENTION

The present invention relates to microprocessor systems, and more particularly to a memory access system for a microprocessor system to efficiently retrieve unaligned data.

BACKGROUND OF THE INVENTION

FIG. 1(a) is a simplified block diagram of a conventional microprocessor system 100a having a central processing unit (CPU) 110 coupled to a memory system 120. CPU 110 includes an address generator 112, a data aligner 114 and various pipelines and execution units (not shown). Address generator 112 provides a memory address ADDR to memory system 120. Memory address ADDR is used to activate a row of memory system 120. In general, a memory address includes a row portion that forms a row address for memory system 120. The remaining bits of the memory address designate a specific portion of the memory row. For clarity, the description herein assumes that the bottom row of memory system 120 has a row address of 0. Each successive row has a row address that is one more than the previous row. Furthermore, memory system 120 is described as being 64 bits wide and is conceptually divided into four 16 bit half words. CPU 110 uses data aligner 114 to load data from or store data to memory system 120. Specifically, data aligner 114 couples a 64 bit internal data bus I_DB to memory system 120 using four 16 bit data buses DB0, DB1, DB2, and DB3. Conceptually, internal data bus I_DB contains four 16 bit data half words that can be reordered through data aligner 114.

CPU 110 may access memory system 120 with multiple store and load instructions of different data widths. For example, CPU 110 may support instructions that work with 8, 16, 32, 64, 128, 256 or 512 bit data widths. Furthermore, CPU 110 may support storing and loading of multiple data words simultaneously using a single access. For example, CPU 110 may write four 16 bit data words simultaneously as a single 64 bit memory access.

The ability to access data having different data widths may result in unaligned data. As illustrated in FIG. 1(a), memory system 120 contains data sets A, B, C, D, E, and F. Each data set is stored as one or more half words (i.e., 16 bits wide) in memory system 120. For example, data set A includes half words A0, A1, A2, and A3. Data set B includes half word B0. Data set C includes half words C0 and C1. Data set D includes half words D0, D1, D2, and D3. Data set E includes half words E0 and E1. Data set F includes half words F1, F2, F3, and F4 (not shown). Data set A, which is located completely in row 0, is aligned data and can easily be retrieved in one memory access. However, data set D is located in both row 1 and row 2. To retrieve data set D, CPU 110 must access memory system 120 twice: first to retrieve half word D0 in row 1 and then to retrieve half words D1, D2, and D3 in row 2.
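The row-crossing cost described above can be illustrated with a minimal sketch. It assumes a linear half-word index in which row 0 holds indices 0 to 3, row 1 holds indices 4 to 7, and so on; the function name rows_touched and the starting index 7 for data set D (the last half word of row 1) are illustrative assumptions inferred from the description, not part of the original disclosure.

```python
HALF_WORDS_PER_ROW = 4  # 64 bit row divided into 16 bit half words, as described


def rows_touched(start_half_word, length):
    """Return the row addresses covered by `length` consecutive half words.

    `start_half_word` is a hypothetical linear index (row * 4 + position).
    """
    first_row = start_half_word // HALF_WORDS_PER_ROW
    last_row = (start_half_word + length - 1) // HALF_WORDS_PER_ROW
    return list(range(first_row, last_row + 1))


# Data set A: four half words starting at index 0, entirely in row 0.
print(rows_touched(0, 4))
# Data set D: assumed to start at index 7, so D0 is in row 1 and D1..D3 in row 2.
print(rows_touched(7, 4))
```

With a conventional single-ported memory, each row returned by the sketch corresponds to one memory access, so the aligned data set needs one access and the unaligned data set needs two.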

Because memory bandwidth is one of the main factors limiting the performance of microprocessor system 100a, requiring multiple memory accesses to retrieve a single data set greatly decreases the performance of microprocessor system 100a. Replacing memory system 120 with a dual ported memory can eliminate the need for two memory accesses. However, dual ported memories greatly increase the silicon cost (i.e. area) of the memory system as well as the power consumption of the memory system.

Furthermore, as illustrated in FIG. 1(b), some microprocessor systems such as microprocessor system 100b include a cache 130 to increase memory performance. As is well known in the art, caches are generally small, fast memories that store recently used data so that repeated access to the data can be performed very quickly. In general, when data is read from, or written to, the main memory (i.e. memory system 120) a copy is also saved in cache 130 along with the memory address of the data. Cache 130 monitors subsequent reads and writes and determines whether the requested data is already in cache 130. When the data is already in cache 130 (i.e. a cache hit) the data in cache 130 is used rather than memory system 120. Because data in cache 130 can be accessed faster than memory system 120, the performance of the overall system is improved. Furthermore, data is generally transferred between memory system 120 and cache 130 in a cache line, which is generally several times larger than a memory access of CPU 110. Using large cache lines generally improves cache hit ratios because data that are in close proximity in memory are generally used together. Furthermore, large cache lines improve burst transfers on busses for write back and refilling. While caches that support aligned access are straightforward and well known in the art, caches supporting unaligned access have even larger problems than described above with respect to memory system 120, because the cache lines are larger and in general more lines are read at the same time.

Hence there is a need for a method or system that provides fast unaligned access to a memory system without requiring high power utilization or large silicon area.

SUMMARY

Accordingly, a microprocessor system in accordance with the present invention uses a cache which includes multiple memory towers, each having multiple way sub-towers. A cache line is divided across all the memory towers. Within each memory tower the data segments of a cache line are stored in a single way sub-tower. However, each segment of the cache line is stored on a separate physical line within a set in the way sub-tower. Specifically, a cache line includes a set of sequential data segments, and each of the first M successive data segments is placed in a different memory tower. The (M+x)th data segment goes into the same memory tower as the xth data segment. Because the memory towers receive independent addresses, different physical lines of each memory tower can be accessed simultaneously to support unaligned data access in a single memory access.

In one embodiment of the present invention, a cache unit includes a first memory tower and a second memory tower. Each memory tower includes a first way sub-tower and a second way sub-tower. A cache line of the cache unit would include a first set of data segments in the first way sub-tower of the first memory tower and a second set of data segments in the first way sub-tower of the second memory tower. Each data segment of the first cache line in a particular way sub-tower is located in a different physical line of the memory tower. Unaligned cache access is supported by activating the appropriate physical line of each memory tower to provide the desired data segments. A data aligner is used to realign the data segments to the proper order.

The present invention will be more fully understood in view of the following description and drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1(a) is a simplified block diagram of a conventional microprocessor system.

FIG. 1(b) is a simplified block diagram of a conventional microprocessor system having a cache.

FIG. 2 is a simplified block diagram of a conventional cache unit.

FIG. 3 is a block diagram of a novel cache unit in accordance with one embodiment of the present invention.

FIG. 4 is a block diagram of a novel tag unit in accordance with one embodiment of the present invention.

FIG. 5 is a block diagram of a data aligner in accordance with one embodiment of the present invention.

FIG. 6(a)-(d) illustrate the memory towers of a cache unit in accordance with one embodiment of the present invention.

FIG. 7 illustrates the arrangement of data segments in the memory towers of a cache unit in accordance with one embodiment of the present invention.

DETAILED DESCRIPTION

As explained above, conventional microprocessor systems do not provide adequate memory bandwidth for data sets stored in more than one row of a memory system. While using a dual port memory provides higher bandwidth, the cost in silicon area and power for the dual port memory prevents widespread use of dual port memories. Furthermore, dual ported memories operate at lower frequencies than single ported memories. Co-pending U.S. patent application Ser. No. ______ (Attorney Docket No. INF-025), entitled “FAST UNALIGNED MEMORY ACCESS SYSTEM AND METHOD”, by Oberlaender, et al., herein incorporated by reference, describes a multi-towered memory system that allows retrieval or storage of a data set on multiple rows of a memory system using a single memory access without the detriments associated with a dual port memory system. The present invention describes a novel cache structure that supports unaligned accesses for multi-towered memory systems.

FIG. 2 illustrates a conventional cache unit 200. Cache unit 200 includes a tag unit 210, a multiplexer 230, a first memory tower 220_T0 and a second memory tower 220_T1. For clarity, most caches described herein are 2-way set associative caches. However, the principles of the present invention can be adapted by one skilled in the art for any arbitrary N-way set associative cache. As is well known in the art, N-way set associative caches divide the cache into sets of N memory locations. Each memory location of main memory is mapped to one of the sets and could be located in any of the N locations of the set in the cache. For clarity, each member of a set is called a “way” herein. For further clarity, unless otherwise stated, the caches described herein are for 32 bit (one whole word) systems and are described using 16 bit half words. Other embodiments of the present invention can use larger or smaller data words.

The memory locations in the memory towers are described by half word HW_X_Y, where X is the cache line of the half word and Y is the location of the half word within the cache line. Each cache line for the embodiment of FIG. 2 contains 8 half words. Thus, for example, cache line 0 includes half words HW_0_0, HW_0_1, HW_0_2, HW_0_3, HW_0_4, HW_0_5, HW_0_6, and HW_0_7. Cache lines 0 and 1 form one set, cache lines 2 and 3 form a second set, and in general cache lines X and X+1, where X is even, form a set. Thus, as illustrated in FIG. 2, cache unit 200 stores cache line 0, cache line 2, and in general cache line X, where X mod 2 is equal to 0, in memory tower 220_T0. Thus, way 0 of each set is stored in memory tower 220_T0. Conversely, cache unit 200 stores cache line 1, cache line 3, and in general cache line X, where X mod 2 is equal to 1, in memory tower 220_T1. Thus, way 1 of each set is stored in memory tower 220_T1.

As illustrated in FIG. 2, cache line 0 occupies four physical lines of memory tower 220_T0. In general, only one physical line of memory tower 220_T0 may be active at a time. For aligned accesses, a half word HW_X_Y and a half word HW_X_Y+1 are read simultaneously, where Y mod 2 is equal to 0. In general, half words HW_X+1_Y and HW_X+1_Y+1 would also be read at the same time because both ways are read simultaneously. Thus, the arrangement of half words in FIG. 2 performs well for aligned accesses. Specifically, a CPU (not shown) accesses cache unit 200 by providing an address ADDR for the desired memory access to tag unit 210, memory tower 220_T0, and memory tower 220_T1. Tag unit 210 determines whether address ADDR is cached in cache unit 200. If address ADDR is in cache unit 200, tag unit 210 drives hit signal HIT to a hit logic level (typically logic high) to indicate that address ADDR is in cache unit 200. Tag unit 210 also controls multiplexer 230 to select either way 0 from memory tower 220_T0 or way 1 from memory tower 220_T1 to connect to data bus DATA.

The architecture of conventional caches such as cache unit 200 does not support unaligned access. For unaligned access, a first half word HW_X_Y and a second half word HW_X_Y+1, where Y can be any number from 0 to 6, may be read. For example, an unaligned access may read half word HW_0_1 and half word HW_0_2 simultaneously. However, half word HW_0_1 and half word HW_0_2 are on separate physical lines of memory tower 220_T0 and thus cannot be accessed simultaneously. Consequently, two memory accesses are necessary and memory performance is greatly degraded. Some conventional caches provide unaligned access by widening each memory tower so that each physical line of the memory tower has the same width as a cache line. Thus, all half words would be accessible simultaneously to allow unaligned access. However, the loading and propagation delay for the selection of the appropriate data is unsuitable for a timing critical system such as a cache unit. Furthermore, the silicon area and the number of sense amplifiers required to implement such a wide cache line would be cost prohibitive.
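The limitation of the conventional arrangement can be sketched as follows. The sketch assumes that each physical line of the FIG. 2 towers holds two consecutive half words (eight half words over four physical lines) and that half word HW_X_Y sits on physical line Y // 2 of its cache line; the function names are hypothetical and only illustrate the reasoning above.

```python
def conventional_location(x, y):
    """Locate half word HW_x_y in the conventional cache of FIG. 2.

    Returns (tower, physical_line_within_cache_line). The Y // 2 placement
    is an assumption based on four physical lines per 8 half word cache line.
    """
    tower = x % 2            # way 0 lines in 220_T0, way 1 lines in 220_T1
    line_in_cache_line = y // 2
    return tower, line_in_cache_line


def conventional_accesses(x, y):
    """Accesses needed to read HW_x_y and HW_x_(y+1) from one cache line."""
    _, first_line = conventional_location(x, y)
    _, second_line = conventional_location(x, y + 1)
    # Both half words are in the same tower, and only one physical line
    # of a tower may be active at a time.
    return 1 if first_line == second_line else 2


print(conventional_accesses(0, 0))  # aligned pair HW_0_0, HW_0_1
print(conventional_accesses(0, 1))  # unaligned pair HW_0_1, HW_0_2
```

Under these assumptions every pair beginning at an odd Y straddles two physical lines of the same tower and therefore costs a second access.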

However, using the novel cache architecture of the present invention, unaligned access can be supported within a cache line with only minimal additional overhead. FIG. 3 is a block diagram of a cache unit 300 in accordance with one embodiment of the present invention. Cache unit 300 includes a control unit 305, a tag unit 310, a memory tower 320_T0, a memory tower 320_T1, way multiplexers 330_T1 and 330_T0, and a data aligner 340. In some embodiments of the present invention, data aligner 340 is part of the central processing unit rather than cache unit 300. Each cache line for the embodiment of FIG. 3 contains 8 half words. Thus, for example, cache line 0 includes half words HW_0_0, HW_0_1, HW_0_2, HW_0_3, HW_0_4, HW_0_5, HW_0_6, and HW_0_7.

Unlike conventional cache units, cache unit 300 stores multiple ways of different cache lines in the same memory tower. Furthermore, a single cache line is divided across multiple towers in even and odd half words. In addition, physical lines of the memory towers are shared by multiple cache lines. For clarity, the first four physical lines of memory tower 320_T0 are referenced as physical lines 320_T0_0, 320_T0_1, 320_T0_2, and 320_T0_3. These four physical lines correspond to one logical cache line and one tag entry. Similarly, the first four physical lines of memory tower 320_T1 are referenced as physical lines 320_T1_0, 320_T1_1, 320_T1_2, and 320_T1_3.

Cache line 0 is stored in both memory tower 320_T0 and memory tower 320_T1. Specifically, half words HW_0_0, HW_0_2, HW_0_4, and HW_0_6 are stored on physical lines 320_T0_0, 320_T0_1, 320_T0_2, and 320_T0_3, respectively. Conversely, half words HW_0_1, HW_0_3, HW_0_5, and HW_0_7 are stored on physical lines 320_T1_0, 320_T1_1, 320_T1_2, and 320_T1_3, respectively. Cache line 0 shares the physical lines of memory towers 320_T1 and 320_T0 with cache line 1. Specifically, half word HW_1_Y shares a physical line with half word HW_0_Y, where Y is an integer from 0 to 7, inclusive. Note that half words where Y is even are all located in memory tower 320_T0 and half words where Y is odd are all located in memory tower 320_T1.
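The placement just described can be summarized in a short sketch. The placement of cache lines 0 and 1 follows the description; extending it to later sets by giving each set the next four physical lines is an assumption, and the function name tower_layout is hypothetical.

```python
PHYSICAL_LINES_PER_SET = 4  # four physical lines per logical cache line


def tower_layout(x, y):
    """Locate half word HW_x_y in cache unit 300.

    Returns (tower, way, physical_line), where tower 0 is 320_T0 and
    tower 1 is 320_T1. Numbering physical lines of later sets as
    set * 4 + offset is an assumption made for illustration.
    """
    tower = y % 2                      # even half words in T0, odd in T1
    way = x % 2                        # cache lines X and X+1 share lines
    physical_line = (x // 2) * PHYSICAL_LINES_PER_SET + y // 2
    return tower, way, physical_line


print(tower_layout(0, 1))  # HW_0_1: tower 320_T1, way 0, physical line 0
print(tower_layout(0, 2))  # HW_0_2: tower 320_T0, way 0, physical line 1
```

Because HW_0_1 and HW_0_2 land in different towers, the two physical lines can be active at the same time, which is what permits the unaligned access.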

Memory towers 320_T0 and 320_T1 are controlled independently by control unit 305. Thus, different physical lines of memory tower 320_T0 and memory tower 320_T1 may be active at the same time. The arrangement of the half words described above, combined with the ability to access the memory towers independently, allows unaligned access within a cache line. For example, to handle a cache access requesting half words HW_0_1 and HW_0_2, control unit 305 simultaneously activates physical line 320_T1_0 of memory tower 320_T1 and physical line 320_T0_1 of memory tower 320_T0. Way multiplexers 330_T1 and 330_T0 are configured to pass half word HW_0_1 and half word HW_0_2, respectively, to data aligner 340. Data aligner 340 would realign half word HW_0_1 and half word HW_0_2 to the appropriate order and provide the data on data bus DATA. Another benefit of this arrangement is reduced power consumption for memory accesses that use a single half word. Specifically, since multiple ways of different cache lines are in the same memory tower, only that memory tower needs to be activated on a half-word access.

In actual operation, a CPU (not shown) makes a memory request with an address ADDR. Control unit 305 activates the appropriate physical lines of memory tower 320_T1 and memory tower 320_T0. The higher order bits of address ADDR determine which logical cache line is used and the lower order bits determine which physical lines are activated. Specifically, the lowest 3 bits would indicate which half word begins the addressed data. Table 1 shows which physical lines within a set would be addressed based on the lower three bits of the address.

TABLE 1

Lower 3 Address Bits    Tower 320_T1    Tower 320_T0
b′000                   0               0
b′001                   0               1
b′010                   1               1
b′011                   1               2
b′100                   2               2
b′101                   2               3
b′110                   3               3
b′111                   3               0
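The entries of Table 1 follow a simple pattern, which can be sketched as below. The function name physical_lines and the shift-based expressions are illustrative assumptions derived from the table, not a statement of how control unit 305 is implemented.

```python
PHYSICAL_LINES_PER_SET = 4


def physical_lines(low_bits):
    """Return (line_T1, line_T0) within a set for the lower 3 address bits.

    The starting half word index is `low_bits`; tower 320_T1 holds odd
    half words and tower 320_T0 holds even half words.
    """
    line_t1 = low_bits >> 1
    # Tower 320_T0 supplies the next even half word at or after the start;
    # the index wraps to physical line 0 for b'111, as in Table 1.
    line_t0 = ((low_bits + 1) >> 1) % PHYSICAL_LINES_PER_SET
    return line_t1, line_t0


for bits in range(8):
    print(format(bits, "03b"), physical_lines(bits))
```

For example, the request for half words HW_0_1 and HW_0_2 has lower bits b′001 and selects physical line 0 of tower 320_T1 and physical line 1 of tower 320_T0.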

Tag unit 310 determines whether the requested address is located within cache unit 300. If address ADDR is located in cache unit 300, tag unit 310 drives cache hit signal HIT to an active logic level (generally logic high) and controls way multiplexers 330_T1 and 330_T0. Data aligner 340 is configured by the low order bits of address ADDR to determine how the data from the way multiplexers should be realigned.

FIG. 4 is a block diagram of tag unit 310. Tag unit 310 includes N tag lines TL[0] to TL[N−1] and a tag comparator 410. Each tag line includes an age bit AB, a first bit field BF[0], and a second bit field BF[1]. Each bit field includes a valid bit VB, a dirty bit DB, and a tag address T_ADDR. Valid bit VB indicates whether the data in the corresponding cache memory location is valid. Dirty bit DB indicates whether the data in the corresponding cache memory location is newer than the corresponding memory location in the main memory. Other embodiments of the present invention include additional status and control bits. For example, in one embodiment of the present invention, a lock bit can be set to ensure that the corresponding entry remains in the cache. If the tag line is used for a different memory location, the current data must be written to main memory if the dirty bit is set to maintain cache coherence. Age bit AB indicates whether bit field BF[0] or bit field BF[1] contains older data. When a tag line must be reused and both bit field BF[0] and bit field BF[1] correspond to valid data (as indicated by the valid bit), the bit field that is older is reused. Other embodiments of the present invention may use other methods for determining which lines are reused. Tag comparator 410 compares a portion of incoming address ADDR with the tag addresses T_ADDR of the set of cache lines corresponding to address ADDR to determine whether a cache hit occurs. Other embodiments of the present invention may use other methods to determine cache hits. When a cache hit occurs, tag comparator 410 drives hit signal HIT to a hit logic level (i.e. logic high); otherwise tag comparator 410 drives hit signal HIT to a miss logic level (i.e. logic low). Furthermore, when a cache hit occurs, tag comparator 410 drives cache way multiplexer control signal CWM to control way multiplexers 330_T1 and 330_T0 to provide the appropriate data to data aligner 340. Specifically, for the embodiment in FIG. 2, when way 1 is needed, tag unit 310 drives cache way multiplexer control signal CWM to logic high, and when way 0 is needed tag unit 310 drives cache way multiplexer control signal CWM to logic low.
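A minimal software sketch of the tag line and the comparison performed by tag comparator 410 is given below. The class and function names are hypothetical. Two points are assumptions rather than statements from the description: the age bit is taken to hold the index of the older bit field, and an invalid bit field is preferred over a valid one when a tag line is reused.

```python
from dataclasses import dataclass, field


@dataclass
class BitField:
    valid: bool = False   # valid bit VB
    dirty: bool = False   # dirty bit DB
    tag_addr: int = 0     # tag address T_ADDR


@dataclass
class TagLine:
    age_bit: int = 0      # assumption: index (0 or 1) of the older bit field
    fields: list = field(default_factory=lambda: [BitField(), BitField()])


def lookup(tag_line, tag):
    """Return (hit, cwm): hit flag and way select (1 = way 1, 0 = way 0)."""
    for way, bit_field in enumerate(tag_line.fields):
        if bit_field.valid and bit_field.tag_addr == tag:
            return True, way
    return False, None


def choose_victim(tag_line):
    """Pick the bit field to reuse; returns (way, needs_write_back)."""
    for way, bit_field in enumerate(tag_line.fields):
        if not bit_field.valid:          # assumption: fill empty ways first
            return way, False
    way = tag_line.age_bit               # both valid: reuse the older field
    return way, tag_line.fields[way].dirty


line = TagLine(age_bit=1, fields=[BitField(True, False, 0x12),
                                  BitField(True, True, 0x34)])
print(lookup(line, 0x34))     # hit in way 1
print(lookup(line, 0x56))     # miss
print(choose_victim(line))    # older field is way 1 and it is dirty
```

The write-back flag returned by choose_victim reflects the requirement that dirty data be written to main memory before the tag line is reused.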

FIG. 5 is a block diagram of an embodiment of data aligner 340. Data aligner 340 includes a multiplexer 510 and a multiplexer 520. The output data from way multiplexer 330_T1 are applied to the logic high input port of multiplexer 520 and the logic low input port of multiplexer 510. The output data from way multiplexer 330_T0 are applied to the logic high input port of multiplexer 510 and the logic low input port of multiplexer 520. Address bit ADDR[0] controls multiplexers 510 and 520. Thus, for example, if the CPU wishes to read an unaligned data word of half words HW_1_3 and HW_1_4, control unit 305 causes memory tower 320_T1 to output the contents of physical line 320_T1_1 and causes memory tower 320_T0 to output the contents of physical line 320_T0_2. Cache way multiplexer control signal CWM would be at logic high because way 1 is being selected. Thus, data aligner 340 receives half word HW_1_3 from way multiplexer 330_T1 and half word HW_1_4 from way multiplexer 330_T0. The data aligner would realign the data so that multiplexer 510 outputs half word HW_1_4 and multiplexer 520 outputs half word HW_1_3.
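The behavior of the two multiplexers can be modeled in a few lines. This is a behavioral sketch only; the function name data_aligner and the use of strings to stand for half words are illustrative assumptions.

```python
def data_aligner(from_t1, from_t0, addr0):
    """Model multiplexers 510 and 520 of data aligner 340.

    from_t1 / from_t0: outputs of way multiplexers 330_T1 and 330_T0.
    addr0: address bit ADDR[0], which selects the logic high (1) or
    logic low (0) input port of both multiplexers.
    Returns (output_of_510, output_of_520).
    """
    out_510 = from_t0 if addr0 else from_t1   # high input: T0, low input: T1
    out_520 = from_t1 if addr0 else from_t0   # high input: T1, low input: T0
    return out_510, out_520


# Unaligned read of HW_1_3 and HW_1_4 (lower address bits b'011, ADDR[0] = 1).
print(data_aligner("HW_1_3", "HW_1_4", 1))
# Aligned read of HW_0_0 and HW_0_1 (lower address bits b'000, ADDR[0] = 0).
print(data_aligner("HW_0_1", "HW_0_0", 0))
```

In the unaligned case the model returns HW_1_4 from multiplexer 510 and HW_1_3 from multiplexer 520, matching the example in the description.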

FIG. 6(a) illustrates the M memory towers of an N-way set associative cache 600 for a system supporting unaligned access of M data segments. For example, M would be 4 for a 64 bit system supporting unaligned access on 16 bit boundaries. As illustrated in FIG. 6(b), a memory tower MT_X is divided into N way sub-towers (WST_0, WST_1, . . . WST_N−1). Furthermore, as illustrated in FIG. 6(c), each way sub-tower is further divided into S sets (SET_0, SET_1, . . . SET_S−1). As illustrated in FIG. 6(d), a set SET_X includes P physical memory lines PL_0, PL_1, . . . PL_P−1. Although FIG. 6(d) only shows one set, the physical line extends through all the way sub-towers. For generality, instead of half words, FIG. 6 uses the notation data segment D_X_Y_Z, where X is the set number, Y is the way number, and Z is the data segment number. A cache line is divided across all the memory towers. Within each memory tower the data segments of a cache line are stored in a single way sub-tower. However, each segment of the cache line is stored on a separate physical line within a set in the way sub-tower. Specifically, a cache line includes a set of sequential data segments, and each of the first M successive data segments is placed in a different memory tower. The (M+x)th data segment goes into the same memory tower as the xth data segment.

Specifically, a cache line would have the data segments D_X_Y_0 to D_X_Y_(P−1)*M+M−1. Physical line PL of set X of way sub-tower Y of memory tower MT would contain data segment D_X_Y_Z, where Z is calculated as PL*M+MT. Thus, for example, in a 2-way set associative cache having a 64-bit wide interface, with access being alignable at each half word (thus each tower is 16 bits wide), and having four physical lines per cache line, N is equal to 2, P is equal to 4 and M is equal to 4. Thus, a cache line from set X and way Y would contain data segments D_X_Y_0, D_X_Y_1, D_X_Y_2, . . . D_X_Y_14, D_X_Y_15. FIG. 7 illustrates how the cache lines from sets 0 and 1 and ways 0 and 1 would be arranged in the 4 memory towers. As explained above, data segment D_X_Y_PL*M+MT is on physical line PL, of set X, in way sub-tower Y, of memory tower MT.
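The relation Z = PL*M+MT and its inverse can be sketched as follows. The function names are hypothetical, and lines_for_access, which lists the physical line each tower must activate for an access of M consecutive segments, is an illustrative generalization that assumes the access stays within one cache line.

```python
def segment_index(pl, mt, m):
    """Data segment number Z held on physical line PL of memory tower MT."""
    return pl * m + mt


def segment_location(z, m):
    """Inverse mapping: return (memory_tower, physical_line) for segment Z."""
    return z % m, z // m


def lines_for_access(start, m):
    """Map each memory tower to the physical line it activates for an
    access of M consecutive segments beginning at segment `start`.

    Assumes start + m - 1 does not run past the end of the cache line.
    """
    return {s % m: s // m for s in range(start, start + m)}


M, P = 4, 4  # example values from the description (N = 2 ways)
print(segment_index(P - 1, M - 1, M))   # last segment of a cache line
print(segment_location(5, M))           # segment 5: tower 1, physical line 1
print(lines_for_access(3, M))           # unaligned access starting at segment 3
```

Every tower appears exactly once in the result of lines_for_access, so an access of M segments needs at most one physical line per tower regardless of the starting segment.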

In the various embodiments of this invention, novel structures and methods have been described to provide unaligned access to a cache. By using a multi-towered cache having independent addressing, splitting the cache lines across multiple towers, and combining corresponding data segments from different ways into one tower, the CPU of a microprocessor system in accordance with the present invention can access the cache in an unaligned manner, while using a minimal interface size equal to the access width multiplied by the number of ways. The smaller interface size reduces overhead as compared to conventional caches that require an interface size equal to the size of the cache line. Furthermore, power consumption can be reduced on partial accesses and overhead is reduced for accessing all ways of the full logical bandwidth. The various embodiments of the structures and methods of this invention that are described above are illustrative only of the principles of this invention and are not intended to limit the scope of the invention to the particular embodiments described. For example, in view of this disclosure, those skilled in the art can define other memory systems, memory towers, tag units, way sub-towers, sets, multiplexers, data aligners, and so forth, and use these alternative features to create a method or system according to the principles of this invention. Thus, the invention is limited only by the following claims.

1. A cache unit comprising: a first memory tower, having a first way sub-tower and a second way sub-tower; a second memory tower, having a first way sub-tower and a second way sub-tower; and wherein a first cache line of the cache unit includes a first plurality of data segments in the first way sub-tower of the first memory tower and a second plurality of data segments in the first way sub-tower of the second memory tower.
2. The cache unit of claim 1, wherein the first cache line comprises sequential data segments; the first plurality of data segments includes a first data segment and a third data segment; and the second plurality of data segments includes a second data segment and a fourth data segment.
3. The cache unit of claim 1, wherein a second cache line of the cache unit includes a first plurality of data segments in the second way sub-tower of the first memory tower and a second plurality of data segments in the second way sub-tower of the second memory tower.
4. The cache unit of claim 3, wherein a physical line of the first memory tower includes data segments from the first cache line and the second cache line.
5. The cache unit of claim 1, further comprising a first way multiplexer having a first input port coupled to the first way sub-tower of the first memory tower, a second input port coupled to the first way sub-tower of the first memory port; and an output port.
6. The cache unit of claim 5, further comprising a second way multiplexer having a first input port coupled to the first way sub-tower of the second memory tower, a second input port coupled to the first way sub-tower of the second memory port; and an output port.
7. The cache unit of claim 5, further comprising a tag unit coupled to control the first way multiplexer and the second way multiplexer.
8. The cache unit of claim 7, wherein the tag unit is configured to determine whether a memory address is cached by the cache unit.
9. The cache unit of claim 5, further comprising a data aligner coupled to the output port of the first way multiplexer and the output port of the second way multiplexer.
10. The cache unit of claim 1, wherein the first memory tower further comprises a third way sub-tower and a fourth way sub-tower.
11. The cache unit of claim 1, further comprising a third memory tower and a fourth memory tower.
12. The cache unit of claim 11, wherein the first cache line includes a third plurality of data segments in the third memory tower and a fourth plurality of data segments in the fourth memory tower.
13. A method of operating a cache unit having a first memory tower and a second memory tower, the method comprising: storing a first plurality of data segments of a first cache line in a first way sub-tower of the first memory tower; storing a second plurality of data segments of the first cache line in a first way sub-tower of the second memory tower; storing a first plurality of data segments of a second cache line in a second way sub-tower of the first memory tower; and storing a second plurality of data segments of the second cache line in a second way sub-tower of the second memory tower.
14. The method of claim 13, further comprising: activating a first physical line of the first memory tower, wherein the first physical line includes data segments from the first cache line and the second cache line; and activating a second physical line of the second memory tower, wherein the second physical line of the second memory tower includes data segments from the first cache line and the second cache line.
15. The method of claim 14, wherein the first physical line of the first memory tower has a different address than the second physical line of the second memory tower.
16. The method of claim 15, wherein a first data segment of the first cache line is in the first way sub-tower of the first memory tower and a second data segment of the first cache line is in the first way sub-tower of the second memory tower and wherein the second data segment is adjacent the first data segment.
17. The method of claim 15, wherein the first data segment is in the first physical line of the first memory tower and the second data segment is in the second physical line of the second memory tower.
18. The method of claim 17, further comprising realigning the first data segment and the second data segment.
19. A cache unit having a first memory tower and a second memory tower, comprising: means for storing a first plurality of data segments of a first cache line in a first way sub-tower of the first memory tower; means for storing a second plurality of data segments of the first cache line in a first way sub-tower of the second memory tower; means for storing a first plurality of data segments of a second cache line in a second way sub-tower of the first memory tower; and means for storing a second plurality of data segments of the second cache line in a second way sub-tower of the second memory tower.
20. The cache unit of claim 19, further comprising: means for activating a first physical line of the first memory tower, wherein the first physical line includes data segments from the first cache line and the second cache line; and means for activating a second physical line of the second memory tower, wherein the second physical line of the second memory tower includes data segments from the first cache line and the second cache line.
21. The cache unit of claim 20, wherein the first physical line of the first memory tower has a different address than the second physical line of the second memory tower.
22. The cache unit of claim 21, wherein a first data segment of the first cache line is in the first way sub-tower of the first memory tower and a second data segment of the first cache line is in the first way sub-tower of the second memory tower and wherein the second data segment is adjacent the first data segment.
23. The method of claim 21, wherein the first data segment is in the first physical line of the first memory tower and the second data segment is in the second physical line of the second memory tower.
24. The cache unit of claim 23, further comprising means for realigning the first data segment and the second data segment.